Segmenting DNA sequence into words based on statistical language model
نویسندگان
چکیده
منابع مشابه
Segmenting DNA sequence into 'words' based on statistical language model
[Abstract] This paper presents a novel method to segment/decode DNA sequences based on n-gram statistical language model. Firstly, we find the length of most DNA “words” is 12 to 15 bps by analyzing the genomes of 12 model species. The bound of language entropy of DNA sequence is about 1.5674 bits. After building an n-gram biology languages model, we design an unsupervised ‘probability approach...
متن کاملSegmenting DNA sequence into `words'
[Abstract] This paper presents a novel method to segment/decode DNA sequences based on statistical language model. Firstly, we find the length of most DNA “words” is 12 to 15 bps by analyzing the genomes of 12 model species. Then we apply the unsupervised approach to build the DNA vocabulary and design DNA sequence segmentation method. We also find different genomes is likely to use the similar...
متن کاملWords Reordering based on Statistical Language Model
There are multiple reasons to expect that detecting the word order errors in a text will be a difficult problem, and detection rates reported in the literature are in fact low. Although grammatical rules constructed by computer linguists improve the performance of grammar checker in word order diagnosis, the repairing task is still very difficult. This paper presents an approach for repairing w...
متن کاملmortality forecasting based on lee-carter model
over the past decades a number of approaches have been applied for forecasting mortality. in 1992, a new method for long-run forecast of the level and age pattern of mortality was published by lee and carter. this method was welcomed by many authors so it was extended through a wider class of generalized, parametric and nonlinear model. this model represents one of the most influential recent d...
15 صفحه اولA new model for persian multi-part words edition based on statistical machine translation
Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Nature Precedings
سال: 2012
ISSN: 1756-0357
DOI: 10.1038/npre.2012.6939.1